ABSTRACT

An algorithm is presented for the design of systolic
arrays that implement single-input single-output
time-invariant digital filters. The algorithm is then specialized
to the case where the realized array consists only of
orthogonal rotational modules and delay elements
interconnected in such a manner as to render the circuit
pipelineable.

1.  INTRODUCTION 

Consider a single-input single-output linear time-invariant
discrete-time system that allows a transfer
function description:

    H(z) = N(z) / D(z)    (1)

where N(z) and D(z) are polynomials in z.

The system is said to be strictly stable if the function
H(z), in the complex variable z, is analytic on the unit
circle, II, and in the region E, where |z| > 1. In other
words, the zeros of the polynomial D(z) should lie in the
region D, where |z| < 1. A strictly stable function is said
to be a Schur function if it has magnitude less than or
equal to unity on II.
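
The two definitions above can be checked numerically. The following is a minimal sketch, not part of the original paper: it assumes H(z) = N(z)/D(z) with coefficients listed in descending powers of z, tests strict stability with the classical Schur-Cohn recursion, and tests the Schur property only on a finite grid of points of the unit circle. The function names are illustrative.

```python
import cmath
import math


def polyval(coeffs, z):
    """Evaluate a polynomial given in descending powers of z (Horner's rule)."""
    acc = 0j
    for c in coeffs:
        acc = acc * z + c
    return acc


def is_strictly_stable(den):
    """Schur-Cohn test: True if every zero of D(z) lies in |z| < 1.

    Assumes the leading coefficient of D(z) is nonzero.
    """
    a = [complex(c) for c in den]
    while len(a) > 1:
        n = len(a) - 1
        k = a[n] / a[0].conjugate()  # ratio that cancels the constant term
        if abs(k) >= 1.0:
            return False
        # Coefficients of (D(z) - k * D#(z)) / z, where
        # D#(z) = z^n * conj(D(1 / conj(z))) is the reversed polynomial.
        a = [a[j] - k * a[n - j].conjugate() for j in range(n)]
    return True


def is_schur_sampled(num, den, samples=1024, tol=1e-12):
    """Strictly stable and |H| <= 1 at `samples` points of the unit circle."""
    if not is_strictly_stable(den):
        return False
    for m in range(samples):
        z = cmath.exp(2j * math.pi * m / samples)
        if abs(polyval(num, z) / polyval(den, z)) > 1.0 + tol:
            return False
    return True
```

For instance, H(z) = 0.4/(z - 0.5) passes both tests, whereas H(z) = 1/(z - 0.5) is strictly stable but not a Schur function, since its magnitude reaches 2 at z = 1. A sampled test can miss a narrow peak between grid points, so it is a sanity check rather than a proof.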

In this paper, the transpose of a matrix is denoted by the
superscript T, the complex conjugate transpose by an
over-bar, and the para-Hermitian conjugate by the
subscript *. The para-Hermitian conjugate of a matrix
function is, of course, defined as:

    A_*(z) = \overline{A(1/\bar{z})}
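
As a small illustration for the scalar case, where the conjugate transpose reduces to the complex conjugate, the sketch below evaluates the para-Hermitian conjugate of a polynomial at a point. The helper names are illustrative and not taken from the paper.

```python
import cmath


def polyval(coeffs, z):
    """Evaluate a polynomial given in descending powers of z (Horner's rule)."""
    acc = 0j
    for c in coeffs:
        acc = acc * z + c
    return acc


def para_conjugate(coeffs, z):
    """Value of A_*(z) = conj(A(1 / conj(z))) for a scalar polynomial A."""
    z = complex(z)
    return polyval(coeffs, 1.0 / z.conjugate()).conjugate()


# A(z) = z + (-0.5 + 0.25j), an arbitrary example polynomial.
A = [1.0, -0.5 + 0.25j]
z_on_circle = cmath.exp(0.7j)
gap = abs(para_conjugate(A, z_on_circle) - polyval(A, z_on_circle).conjugate())
```

On the unit circle 1/conj(z) equals z, so A_*(z) coincides with the complex conjugate of A(z) there and `gap` is zero up to rounding. This is why products such as A(z)A_*(z) reduce to squared magnitudes on the unit circle.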

Given a transfer function, H(z), the synthesis problem is
to obtain a hardware implementation of H(z). The quality
of such an implementation using VLSI technology is
then dependent upon the following criteria:

(1)  Modularity: The circuit should preferably be a regular
interconnection of similar processing elements.

(2)  Pipelineability: The maximum throughput achievable
should be independent of the number of processing
elements, n, in the array.

(3)  Uniformity: It is desirable to design VLSI circuits
that can implement digital filters with a variety of
transfer functions of different orders by merely
changing certain parameters of the circuit.

(4)  Communication: Communication between the processing
elements should be restricted to immediate
neighbours. A circuit that is modular and has only
nearest-neighbour links is said to be systolic [2].

(5)  Numerical Stability: There are certain undesirable
phenomena, e.g., limit-cycle and overflow oscillations,
associated with the finite-precision implementation
of linear systems that have been well studied
in the literature [3]. The circuit should be free of
them. It should also be insensitive to slight variations
in the multiplicative parameters in the
implementation.

In the next section, a systolic array configuration is
proposed for the circuit that, in general, satisfies properties
(1), (3) and (4). The pipelineability of such an array is then
considered, and it will be shown that every strictly stable
transfer function can be realized in several different
ways that ensure that all the above desirable criteria are
met.

2.  BI-DIRECTIONAL LINEAR SYSTOLIC ARRAYS

A systolic array is said to be bi-directional if every
processing element receives input from and provides output
to each of its neighbours. (The external environment is a
neighbour to the processing elements at the corners of
the array.) The array is said to be linear in operation if
each processing element performs only linear operations
on its inputs. It is said to be linear in configuration if
each processing element has only two neighbours. A systolic
array that is linear both in operation and in
configuration is linear. Such an array is time-invariant
if the z-transform of the vector output sequence, Y_i(z),
of its i-th processing element is related to the
z-transform of its corresponding vector input sequence,
U_i(z), by:

    Y_i(z) = E_i(z) U_i(z)    (2)

where E_i(z) is some rational matrix function of z,
necessarily square, such that E_i(\infty) has only finite elements, in
order to ensure causality. A linear systolic array is said
to have a ternary signature if it has either (a) two wires
flowing from right to left and one from left to right, or
(b) vice-versa. (See Fig. 1.)

The intent of this paper is to present a general framework
for designing such arrays so that when they are
suitably terminated at both ends with constant multiplier
interconnections, they are realizations for H(z).
The reasons for choosing ternary signature arrays as
opposed to binary signature arrays are the following:

A pipelined orthogonal realization of H(z) in a binary
signature array requires, in general, at least 2n delay
elements and (3n + 1) elementary Givens rotational
modules for its implementation (see [1]). On the other
hand, ternary signature arrays with n delay elements
and (2n + 1) elementary Givens rotational modules will
be derived in what follows. Another reason is that even if
we synthesize the filter in a binary signature array with
the attendant increase in hardware requirement, the
speed at which the array can process the input sequence
is only half the speed that is achievable using ternary
signature arrays. Clearly, there is no tradeoff here.
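
To make the hardware comparison concrete, the counts quoted above can be tabulated for a given filter order n. This is only a restatement of the figures in the text (the binary-signature numbers are lower bounds), and the function name is illustrative.

```python
def resource_counts(n):
    """Delay and Givens-rotation counts quoted in the text for order n.

    The binary-signature figures are the minimum requirement ("at least").
    """
    return {
        "binary": {"delays": 2 * n, "rotations": 3 * n + 1},
        "ternary": {"delays": n, "rotations": 2 * n + 1},
    }


counts = resource_counts(4)
```

For a fourth-order filter this gives at least 8 delays and 13 rotations for the binary signature, against 4 delays and 9 rotations for the ternary signature.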
Consider a linear time-invariant systolic array with a
ternary signature as shown in Fig. 1. Suppose that for some
unknown constant termination at one end of the array,
say the right end, the input to output transfer at the left
end is given by (with reference to Fig. 1):

    [y_{02}(z)  y_{03}(z)]^T = \varphi(z) u_{01}(z),    \varphi(z) = [P(z)  R(z)]^T / Q(z)    (3)

The function $\varphi(z)$ will be referred to as the generating
function of the systolic array, for reasons that will
become clear in what follows. The degree of the function
$\varphi(z)$ is the polynomial degree of Q(z), assuming that
the greatest common divisor of Q(z), P(z), R(z) is
constant.

For a given transfer function, H(z), there are several
possible choices of $\varphi(z)$. Suppose that the designer
wishes to use

    u(z) = k_1 u_{01}(z) + k_2 y_{02}(z)    (4a)

    y(z) = y_{03}(z)    (4b)

such that the transfer function from u(z) to y(z) is
H(z). Then $\varphi(z)$ should be such that P(z), Q(z), R(z)
are polynomials satisfying the relationships:

    R(z) = N(z)    (5a)

    k_1 Q(z) + k_2 P(z) = D(z)    (5b)

Clearly, the choice of P(z) and Q(z) is not unique.
Moreover, the designer might wish to use different
relationships in eqn. (4) in order to obtain the desired input and
output terminals, and the nonuniqueness is thus further
enhanced.
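
The terminal relations can be checked numerically. The sketch below assumes one reading of eqns. (3) to (5): y_02 = (P/Q) u_01, y_03 = (R/Q) u_01, u = k_1 u_01 + k_2 y_02 and y = y_03, so that y/u = R/(k_1 Q + k_2 P). The coefficient values and the free choice of P(z) are hypothetical and serve only as an example.

```python
def polyval(coeffs, z):
    """Evaluate a polynomial given in descending powers of z (Horner's rule)."""
    acc = 0j
    for c in coeffs:
        acc = acc * z + c
    return acc


def terminal_transfer(P, Q, R, k1, k2, z):
    """y(z)/u(z) for u = k1*u01 + k2*y02 and y = y03, driving u01 = 1."""
    q = polyval(Q, z)
    y02 = polyval(P, z) / q
    y03 = polyval(R, z) / q
    return y03 / (k1 + k2 * y02)


# Hypothetical second-order example with H(z) = N(z)/D(z).
N = [0.0, 0.3, 0.1]
D = [1.0, -0.5, 0.06]
k1, k2 = 2.0, 1.0
P = [0.0, 0.2, 0.1]                             # arbitrary choice of P(z)
Q = [(d - k2 * p) / k1 for d, p in zip(D, P)]   # solves k1*Q + k2*P = D
R = N                                           # R = N

z0 = 1.3 + 0.4j
mismatch = abs(terminal_transfer(P, Q, R, k1, k2, z0)
               - polyval(N, z0) / polyval(D, z0))
```

Any other P(z) would serve equally well here, which illustrates the nonuniqueness noted above: only the combination k_1 Q + k_2 P is fixed by H(z).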

We will get back to this problem of the choice of the
generating function later. For the present, assume that $\varphi(z)$
is known.

The realization procedure, then, depends upon the
solution to the following problem:

Given the (2x1) vector generating function, $\varphi(z)$, of
degree n, obtain a linear time-invariant systolic
array with n processing elements and the constant
termination at the right end of the array in Fig. 1,
such that the input to output transfer at the left
end of the array is $\varphi(z)$.

The following algorithm solves the above problem:


The Systolic Realization Algorithm:

It is required that the systolic array in Fig. 1, with the i-th
processing element defined by:

    [y_{i1}  y_{i2}  y_{i3}]^T = E_i(z) [u_{i1}  u_{i2}  u_{i3}]^T    (6)

is such that when the right end of the array is
terminated with

    [u_{(n-1),2}  u_{(n-1),3}]^T = \varphi_n(z) y_{(n-1),1}    (7)

the signals at the left end of the array are related by:

    [y_{02}  y_{03}]^T = \varphi(z) u_{01}(z)    (8)

and that $\varphi_n(z)$ in eqn. (7) is a constant vector.

Step 1: Initialization: Let $\varphi_0(z) = \varphi(z)$.

For i = 0, 1, ..., n-1, do


Step 2: Choose the complex scalar constants $\alpha_i$, $\beta_i$, the
(2x2) nonsingular constant matrix $C_i$ and the (1x2)
constant vector $v_i^T$, and obtain $\varphi_{i+1}(z)$ from eqn. (9a). (We will
restrict these choices later in order to ensure that the array is
pipelineable and orthogonal. For the present, assume that
they are free parameters subject to the designer's fancy
and the equations below.)

    [Eqn. (9a), which defines $\varphi_{i+1}(z)$, is not legible in the source.]    (9a)

The choices must be such that the (3x3) constant matrix $\Theta_i$,
defined by eqn. (9b), is nonsingular, and such that eqns. (9c)
and (9d) hold:

    [Eqns. (9b)-(9d) are not legible in the source.]    (9b)-(9d)

and $p_i(z)$ is chosen to be:

    p_i(z) = (z - \alpha_i) / (z - \beta_i),    \alpha_i, \beta_i \ne \infty    (9e)

    p_i(z) = z - \alpha_i,    \beta_i = \infty,  \alpha_i \ne \infty    (9f)

Step 3: Form the transfer matrix $E_i(z)$ of the i-th processing element:

    [Eqn. (10), which defines $E_i(z)$, is not legible in the source.]    (10)

Comments: The constraint (9d) imposed in step 2 can be
relaxed in certain cases. Details can be found in [1].
Constraint (9b) is necessary as shown below, and (9c) is
required to ensure causality.


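Proof of the algorithm: Using eqn. (10) in eqn. (6), it can
be shown, after some simple algebra, that the signals of the
i-th processing element are related through the product
diag[p_i^{-1}(z), 1, 1] \Theta_i:

    [Eqn. (11) is not fully legible in the source.]    (11)

Now, if

    [y_{i2}(z)  y_{i3}(z)]^T = \varphi_i(z) u_{i1}(z)    (12)
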

i.e., $\varphi_i(z)$ is the generating function for the i-th
sub-array, that is, the cascade of processing elements that
are indexed i or greater in Fig. 1, then, by substituting
eqn. (12) in eqn. (11), using the defining relationship in
eqn. (9a) and the interconnecting equations displayed in
Fig. 1, we have that $\varphi_{i+1}(z)$ is the generating function for the
(i+1)-th sub-array.

The assumption that $\Theta_i$ is invertible ensures that the
converse of this statement is true, i.e., if $\varphi_{i+1}(z)$ is the
generating function for the (i+1)-th sub-array and if the
i-th processing element has a transfer defined by
eqns. (6) and (10), then $\varphi_i(z)$ is the generating function for the
i-th sub-array. (The invertibility of $\Theta_i$ is necessary in
order to state this [1].) Hence, if the right end of the
array in Fig. 1 is terminated according to eqn. (7), then
the input to output transfer function at the left end of
the array is $\varphi(z)$, as desired.

It is easy to check that if $\varphi_i(z)$ is of degree n_i, then
$\varphi_{i+1}(z)$ is of degree n_i - 1 at most. It has been proved in
[1] that the degree is exactly n_i - 1. Hence, $\varphi_n(z)$ is a
constant vector.

Comments: A remarkable property of the systolic
realization algorithm is that almost every known canonical
realization of digital filters, e.g., controller,
controllability, cascade, parallel [5], all the different versions of the
Gray-Markel lattice filter [6], wave-digital translations of
passive analog Foster and Cauer canonical forms [7], the
orthogonal digital filter realizations of Deprettere et
al. [9], to name a few, can be obtained by certain choices
of the various parameters used in the algorithm.
Moreover, for every realization obtained using the systolic
realization algorithm, there is a dual realization that can
be obtained using the duality theorem in [1], e.g., the
observer form realization is dual to the controller form
realization, etc.

Properties of The Systolic Realization Algorithm:

Some relevant properties of the systolic realization
algorithm are given below. A detailed discussion of these and
other properties, together with their proofs, can be found
in [1].

(1)  The systolic array obtained by the algorithm is in
general a noncomputable cascade. However, it can
always be made computable by executing bilinear
transformations on the functions $\varphi_i(z)$ at each step of
the algorithm.

(2)  The array leads to a computable cascade as it is if

    \alpha_i = 0;    \beta_i = \infty    (13)

This results in p_i(z) = z for all i.

(3)  Pipelineability: The array leads to a pipelineable
implementation if $\alpha_i$, $\beta_i$ are chosen as in eqn. (13) [1, 9].
The throughput is maximum, all other factors being the
same, if and only if this choice is adhered to [1]. Such
arrays will be referred to as completely pipelineable
arrays in what follows.

3.  COMPLETELY PIPELINEABLE ORTHOGONAL ARRAYS

The systolic array in Fig. 1 is said to be orthogonal if
every processing element can be implemented as an
interconnection of orthogonal rotational modules and
blocks. If, in addition, the array is completely
pipelineable, it is a completely pipelineable, orthogonal
array. It has been shown in [1] that such arrays have
very good numerical properties and therefore satisfy all
the desirable criteria mentioned above for VLSI
implementation.

A necessary and sufficient condition for a completely
pipelineable array to be orthogonal is that the constant
matrix, $\Gamma_i$, defined by:

    E_i(z) = diag[z^{-1}, 1, 1] \Gamma_i    (14)

is an orthogonal matrix for all i. Using the definition of
orthogonality and eqn. (10), this reduces to the
conditions:

    [Eqns. (15a)-(15c), which involve $C_i$, are not legible in the source.]    (15a)-(15c)

Eqn. (15) is solvable if and only if $|\varphi_i(z)| < 1$ and
eqn. (15c) is true for $\varphi_i(z)$. The systolic realization
algorithm then reduces to the Schur algorithm [10].
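
The practical meaning of the orthogonality condition can be illustrated numerically. The sketch below assumes that eqn. (14) reads E_i(z) = diag[z^{-1}, 1, 1] Γ_i, builds an example orthogonal Γ_i as a product of two plane (Givens) rotations with arbitrarily chosen angles, and checks that E_i(z) is unitary on the unit circle, i.e. that the processing element neither creates nor loses signal energy. All names and angle values are illustrative.

```python
import cmath
import math


def givens(n, p, q, theta):
    """n x n plane rotation by theta acting on coordinates p and q."""
    g = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    g[p][p] = g[q][q] = math.cos(theta)
    g[p][q] = -math.sin(theta)
    g[q][p] = math.sin(theta)
    return g


def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]


def conj_transpose(a):
    """Complex conjugate transpose of a nested-list matrix."""
    return [[complex(a[r][c]).conjugate() for r in range(len(a))]
            for c in range(len(a[0]))]


def element_response(gamma, z):
    """E(z) = diag[1/z, 1, 1] * Gamma: a delay on the first output only."""
    delay = [[1.0 / z, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    return matmul(delay, gamma)


def deviation_from_identity(a):
    """Largest entry of |a - I|."""
    return max(abs(a[r][c] - (1.0 if r == c else 0.0))
               for r in range(len(a)) for c in range(len(a)))


# Example orthogonal Gamma built from two rotations (angles are arbitrary).
gamma = matmul(givens(3, 0, 1, 0.3), givens(3, 1, 2, -1.1))

e_on = element_response(gamma, cmath.exp(0.9j))   # z on the unit circle
e_off = element_response(gamma, 2.0 + 0j)         # z off the unit circle
dev_on = deviation_from_identity(matmul(conj_transpose(e_on), e_on))
dev_off = deviation_from_identity(matmul(conj_transpose(e_off), e_off))
```

Here `dev_on` vanishes up to rounding while `dev_off` does not, because the factor 1/z has unit magnitude only on the unit circle.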


The question that now requires to be addressed is the
following: for what choices of the generating function can
the above equations be solved for all i? Theorem 1,
stated below, a special case of which is originally due to
Schur [10], clarifies the situation:


Theorem 1: A completely pipelineable orthogonal array
exists for any generating function, $\varphi(z)$, if and only if:

[a]  $\varphi(z)$ is composed of scalar rational Schur functions,
and

[b]  $\varphi(z)$ satisfies the equation:

    1 - \varphi_*(z) \varphi(z) = \sigma^2 / (Q_*(z) Q(z))    (16)

where $\sigma$ is some real constant, and Q(z) is the
polynomial defined in eqn. (3).

In addition, if in eqn. (16) $\sigma$ is nonzero, then the
termination, $\varphi_n$, is such that $\bar{\varphi}_n \varphi_n < 1$. Else, $\bar{\varphi}_n \varphi_n = 1$.

The proof for the above theorem can be found in [1],
wherein a more general theorem is stated that defines
the conditions for the existence of orthogonal arrays as
such.


Choice of the generating function:

Given a (2x1) function, $\varphi(z)$, that satisfies eqn. (16), and
the restrictive choices of the parameters $C_i$, $\beta_i$, $v_i^T$,
as defined in eqn. (13), the systolic realization algorithm
yields a completely pipelineable orthogonal array. The
next step is to obtain this generating function, $\varphi(z)$, from
the given scalar transfer function, H(z). Again, there are
several ways in which this choice can be made.

(1)  Firstly, the terminating conditions at the left end of
the array have to be chosen.

(2)  Secondly, the constant $\sigma$ in eqn. (16) has to be chosen.




The restrictions for making these choices and the
general properties of the resulting arrays can be found in
[1]. For the present, suppose the terminating conditions
are given as in eqns. (4) and (5) and that the constant $\sigma = 0$ in
eqn. (16). Even then, there are two constant parameters, k_1 and k_2,
available to the designer, and only two particular
choices of these parameters are considered below:

(1)  The Direct Embedding:

Suppose k_2 = 0. Then P(z), Q(z), R(z) have to satisfy the
relationships:

    R(z) = N(z)    (17a)

    Q(z) = D(z) / k_1    (17b)

    P(z) P_*(z) = D(z) D_*(z) / k_1^2 - N(z) N_*(z)    (17c)

From the well-known spectral factorization theorem, it
can be derived that eqn. (17c) is solvable if and only if

    |k_1 H(e^{j\omega})| \le 1    for all 0 \le \omega \le 2\pi    (17d)

The resulting pipelined orthogonal array together with
some simulation results have been discussed in depth in
[11], wherein it has also been shown that the solution to
eqn. (17c) is trivially obtainable when H(z) is a
Butterworth, or a Tchebycheff, or an Elliptic function, for
k_1 = 1.
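
The solvability condition of the direct embedding can be probed numerically. On the unit circle a product X(z)X_*(z) equals |X|^2, so a spectral factor P(z) can exist only if D D_*/k_1^2 - N N_* is nonnegative there. The sketch below evaluates that quantity on a grid of the unit circle; it assumes a real k_1, uses hypothetical example coefficients, and does not compute the spectral factor itself.

```python
import cmath
import math


def polyval(coeffs, z):
    """Evaluate a polynomial given in descending powers of z (Horner's rule)."""
    acc = 0j
    for c in coeffs:
        acc = acc * z + c
    return acc


def spectral_margin(num, den, k1, samples=1024):
    """Minimum of |D|^2 / k1^2 - |N|^2 over a grid of the unit circle.

    A negative value means P(z)P_*(z) would have to be negative somewhere
    on the unit circle, so no spectral factor P(z) exists.
    """
    worst = math.inf
    for m in range(samples):
        z = cmath.exp(2j * math.pi * m / samples)
        value = abs(polyval(den, z)) ** 2 / k1 ** 2 - abs(polyval(num, z)) ** 2
        worst = min(worst, value)
    return worst


# Hypothetical example: H(z) = 0.4 / (z - 0.5), whose peak gain is 0.8 at z = 1.
margin_k1_is_1 = spectral_margin([0.4], [1.0, -0.5], 1.0)
margin_k1_is_2 = spectral_margin([0.4], [1.0, -0.5], 2.0)
```

With k_1 = 1 the margin is 0.09 and the embedding is feasible. With k_1 = 2 the margin is negative, so H(z) would first have to be scaled down.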

(2)  The Darlington Embedding:

Consider the choice k_1 = k_2 = 1. Then

    R(z) = N(z)    (18a)

    Q(z) + P(z) = D(z)    (18b)

Then, if Q(z), P(z) are written as:

    Q(z) = (D(z) + C(z)) / 2    (18c)

    P(z) = (D(z) - C(z)) / 2    (18d)

then

    C(z) D_*(z) + C_*(z) D(z) = 2 N(z) N_*(z)    (18e)

Eqn. (18) is precisely the one used by Deprettere et al. [9]
for obtaining the generating function for their binary
signature orthogonal array. The systolic realization
algorithm is then analogous to the Darlington synthesis
procedure for passive analog filters.
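
The relation between C(z), D(z) and N(z) in the Darlington embedding follows in a few lines. The derivation below is a sketch that assumes the generating function has the form [P(z) R(z)]^T / Q(z) and that the constant in eqn. (16) is zero, so that eqn. (16) reads Q_*Q = P_*P + R_*R.

```latex
\begin{aligned}
Q_* Q - P_* P &= R_* R = N_* N, \\
Q_* Q - P_* P &= \tfrac{1}{4}\bigl[(D_* + C_*)(D + C) - (D_* - C_*)(D - C)\bigr]
               = \tfrac{1}{2}\,(C D_* + C_* D), \\
\Longrightarrow\quad C D_* + C_* D &= 2\, N N_* .
\end{aligned}
```

The first line uses R = N, and the second substitutes Q = (D + C)/2 and P = (D - C)/2; the squared terms D_*D and C_*C cancel and only the cross terms remain.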

For this termination, however, the designer has to be
careful to ensure that the resulting structure is
computable, by scaling the transfer function such that

    R(\infty) / Q(\infty) = 0.

A general methodology for obtaining the generating
function so that the resulting array is orthogonal is available
in [1].

Discussion: In eqn. (15), the matrix $C_i$ can be chosen to
be lower or upper triangular, in which case the
processing element can be implemented using two elementary
Givens rotational modules [1].

In [1], a processor configuration termed the Universal
Schur Filter has been proposed based upon the results
presented here, that is capable of implementing
time-invariant systems of arbitrary order, if the processing
elements have sufficient memory. The arithmetic unit of
the processing element implements the CORDIC
algorithm [?] and the logical unit consists of a few counters
and gates. The operation of the circuit is assumed to be
asynchronous, with the necessary handshaking provided
using binary semaphores. A description of the processing
element in Concurrent Pascal is given. A VLSI circuit
that has at least one such processing element and has
access to sufficient memory is capable of implementing
any strictly stable linear system in such a manner that
the circuit is free of limit-cycle and overflow oscillations
and has low coefficient sensitivity. The throughput of the
circuit is then directly proportional to the number of
processing elements in the circuit, provided this number
is an integer submultiple of n. The throughput is not
influenced by having more than n processing elements
in the circuit. The circuit is also capable of implementing
vector transfer functions of arbitrary order.
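
For readers unfamiliar with CORDIC, the sketch below shows the standard rotation-mode iteration in floating point. It is not the processing element described in [1]: a hardware unit would use fixed-point shifts and a stored table of arctangents, and the function name and iteration count here are illustrative choices.

```python
import math


def cordic_rotate(x, y, angle, iterations=40):
    """Rotate (x, y) by `angle` radians using shift-and-add micro-rotations.

    Converges for |angle| up to about 1.74 rad (the sum of atan(2^-i)).
    """
    gain = 1.0
    for i in range(iterations):
        direction = 1.0 if angle >= 0.0 else -1.0
        shift = 2.0 ** -i
        # Micro-rotation by +/- atan(2^-i), scaled by sqrt(1 + 2^-2i).
        x, y = x - direction * y * shift, y + direction * x * shift
        angle -= direction * math.atan(shift)
        gain *= math.sqrt(1.0 + shift * shift)
    return x / gain, y / gain  # remove the accumulated CORDIC gain


cx, cy = cordic_rotate(1.0, 0.0, math.pi / 4)
rx, ry = cordic_rotate(3.0, 4.0, 0.5)
```

Each micro-rotation needs only shifts, additions and one table lookup, which is what makes it attractive as the arithmetic unit of a rotation-based processing element. After the gain correction the overall operation is a plane rotation, so the vector norm is preserved.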

CONCLUSIONS 

In this paper, the systolic realization algorithm for
synthesizing digital filters on a VLSI circuit has been
presented. The problem of implementing digital filters
on a systolic array in a pipelineable fashion has been
solved in a numerically robust manner. For a more
complete treatment of the VLSI synthesis of digital filters,
the interested reader is referred to [1].